Human Endogenous Retrovirus (HERV) Transcriptome Is Dynamically Modulated during SARS-CoV-2 Infection and Allows Discrimination of COVID-19 Clinical Stages

ABSTRACT SARS-CoV-2 infection is known to trigger an important inflammatory response, which has a major role in COVID-19 pathogenesis. In infectious and inflammatory contexts, the modulation of human endogenous retroviruses (HERV) has been broadly reported, being able to further sustain innate immune responses due to the expression of immunogenic viral transcripts, including double-stranded DNA (dsRNA), and eventually, immunogenic proteins. To gain insights on this poorly characterized interplay, we performed a high-throughput expression analysis of ~3,300 specific HERV loci in the peripheral blood mononuclear cells (PBMCs) of 10 healthy controls and 16 individuals being either convalescent after the infection (6) or retesting positive after convalescence (10) (Gene Expression Onmibus [GEO] data set GSE166253). Results showed that the exposure to SARS-CoV-2 infection modulates HERV expression according to the disease stage and reflecting COVID-19 immune signatures. The differential expression analysis between healthy control (HC) and COVID-19 patients allowed us to identify a total of 282 differentially expressed HERV loci (deHERV) in the individuals exposed to SARS-CoV-2 infection, independently from the clinical form. In addition, 278 and 60 deHERV loci that were specifically modulated in individuals convalescent after COVID19 infection (C) and patients that retested positive to SARS-CoV-2 after convalescence (RTP) as individually compared to HC, respectively, as well as 164 deHERV loci between C and RTP patients were identified. The identified HERV loci belonged to 36 different HERV groups, including members of all three classes. The present study provides an exhaustive picture of the HERV transcriptome in PBMCs and its dynamic variation in the presence of COVID-19, revealing specific modulation patterns according to the infection stage that can be relevant to the disease clinical manifestation and outcome. IMPORTANCE We report here the first high-throughput analysis of HERV loci expression along SARS-CoV-2 infection, as performed with peripheral blood mononuclear cells (PBMCs). Such cells are not directly infected by the virus but have a crucial role in the plethora of inflammatory and immune events that constitute a major hallmark of COVID-19 pathogenesis. Results provide a novel and exhaustive picture of HERV expression in PBMCs, revealing specific modulation patterns according to the disease condition and the concomitant immune activation. To our knowledge, this is the first set of deHERVs whose expression is dynamically modulated across COVID-19 stages, confirming a tight interplay between HERV and cellular immunity and revealing specific transcriptional signatures that can have an impact on the disease clinical manifestation and outcome.

. Sample to sample distance (A) and PCA plot (B) of~60700 cellular genes (A) Heatmaps of the overall similarity between samples: correlation distance measure has been used in clustering columns based on the rlog-normalized expression data of the Gencode v34 set of~60700 cellular genes. Distance values are blue scaled, as represented in the color key and histogram legends. (B) Principal Component Analysis on rlog-normalized cellular genes' expression data. Samples are annotated by condition: red, healthy controls; green, convalescent after recover from SARS-CoV-2 infection; blue, retesting positive after convalescence. Supplementary Figure S4. Gene Ontology analysis of de+ genes colocalized with de+ HERVs in the three subcomparisons GO analysis was performed considering a total of 8, 48, and 23 de-genes that were upregulated along with the colocalized deHERVs in RTP vs HC, C vs HC, and RTP vs C subcomparisons, respectively. **** p=7.9-5 **** p=5.  Figure S5. Boxplot of expression levels for the key deHERVs modulated in all conditions that are not colocalized with de-genes The expression levels (as Transcripts Per Million kilobases values, TPM) of the 14 out of 31 key deHERVs that i) are not co-localized with differentially expressed genes and ii) show TPM >2,5 in at least one condition (14/31) were plotted in the different conditions according to their magnitude of modulation: (A) deHERV down-regulated in SARS-CoV-2 exposed individuals (12/14), and (B) deHERV up-regulated in SARS-CoV-2 exposed individuals (2/14). Plots marked with a star are correspond to the deHERVs with the highest expression (log10TPM >10 for at least one condition). Statistics is based on t-test.

B B
Supplementary Figure S6. Heatmap of the variance of expression for the subset of cellular genes involved in innate immunity (A) Hierarchical clustering of a set of about 1100 cellular genes known to be involved in human innate immune response (as reported in InnateDB). The immune genes are in rows, and the samples are in columns. rlognormalized counts are color-scaled from blue (minimum) to red (maximum). The heatmap has been compared to the one as obtained with the 500 HERVs sorted by the highest variance of expression among conditions (B) (already included in Figure 2, panel B). Correlation distance measure has been used in clustering columns. Samples are annotated by condition: red, healthy controls; green, convalescent after recover from SARS-CoV-2 infection; blue, re-testing positive after convalescence.
Supplementary Figure S7. Representation of the most relevant transcripts mapping to the key 31 deHERVs HERV transcripts were reconstructed de novo using the Trinity pipeline from raw reads of the different HERV loci in the three conditions and are visualized at the correspondent position in the human genome with IGV software. Most relevant transcripts are included in a red square. Annotations for cellular genes ("Gene") and RepeatMasker classes of repetitive elements ("LINE", "SINE", and "LTR") were also activated from IGV tracks and are reported at the bottom of each panel. Overlap of the indicated transcripts with gene exons is highlighted in yellow boxes, indicating potential chimeric transcripts.

Supplementary Tables
Supplementary Table S1. Characteristics of the RNAseq dataset used for the study The table reports each sample ID with the corresponding information, including raw fastq files reads' count and GC content, percentage and number of mapped reads by STAR alignment to hg38 assembly, percentage and number of reads counted at cellular genes and HERV genomic coordinates.

Supplementary Table S2. List of 282 deHERVs modulated in the PBMC of SARS-CoV-2 exposed individuals
The reported deHERVs loci were identified comparing together C and RTP individuals to HC with the package DEseq2m, setting as a threshold an absolute Log2FoldChange ≥ 1 and an adjusted p-value ≤ 0.01. Table S3. Cellular genes co-localized with deHERVs modulated in the PBMC of SARS-CoV-2 exposed individuals The information of genes colocalized with deHERVs (including genomic coordinates, strand, exon count, and protein coding capacity) are reported along with the results of differential expression analysis in the same conditions (C and RTP vs HC).

Supplementary
Supplementary Table S4. Focus on the 50 de-genes co-localized with deHERVs modulated in the PBMC of SARS-CoV-2 exposed individuals Results of differential expression analyses for deHERVs and co-localized de-genes are reported, indicating whether they show a concomitant modulation (in 49 out of 50 cases).

Supplementary Table S5. Results of HERV differential expression analyses in the different subcomparisons
After the overall analysis, the individual conditions have been pairwise compared in dedicated differential expression analyses of C vs HC (571 deHERVs), RTP vs HC (282 deHERVs), and RTP vs C (164 deHERVs). Modulation results are always referred to the first condition as compared to the second one.
Supplementary Table S6. Gene ontology of upregulated de-genes colocalized with upregulated de-HERV in the different subcomparisons Based on the results reported in the previous table (Table S5), we identified genes co-localized with upregulated deHERVs and being themselves upregulated and performed a gene enrichment analysis using the database of GO biological processes on modEnrichr suite (https://maayanlab.cloud/modEnrichr/).

Supplementary Table S7. Expression levels of the 31 deHERVs modulated in all conditions
A total of 31 deHERVs found to be modulated in all subcomparisons (C vs HC, RTP vs HC, and RTP vs C) were further characterized by calculating their expression in the different samples as Transcripts per Million Kilobases (TPM).

Supplementary Table S8. De novo reconstruction of deHERVs putative transcripts
The 31 deHERVs found to be modulated in all conditions (Table S7) have been filtered based on a threshold of mean TPM > 2.5, selecting 12 deHERVs that have been characterized for their transcript production potential after the de novo reconstruction of C, RTP, and HC transcriptomes with the tool Trinity.